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In large scale multiple testing, the use of an empirical null distri- 
bution rather than the theoretical null distribution can be critical for 
correct inference. This paper proposes a "mode matching" method for 
fitting an empirical null when the theoretical null belongs to any ex- 
ponential family. Based on the central matching method for ^-scores, 
mode matching estimates the null density by fitting an appropriate 
exponential family to the histogram of the test statistics by Pois- 
son regression in a region surrounding the mode. The empirical null 
estimate is then used to estimate local and tail false discovery rate 
(FDR) for inference. Delta-method covariance formulas and approx- 
imate asymptotic bias formulas are provided, as well as simulation 
studies of the effect of the tuning parameters of the procedure on 
the bias-variance trade-off. The standard FDR estimates are found 
to be biased down at the far tails. Correlation between test statis- 
tics is taken into account in the covariance estimates, providing a 
generalization of Efron's "wing function" for exponential families. 
Applications with statistics are shown in a family-based genome- 
wide association study from the Framingham Heart Study and an 
anatomical brain imaging study of dyslexia in children. 

1. Introduction. In large-scale multiple testing problems, the observed 
distribution of the test statistics often does not accurately match the the- 
oretical nuh distribution [Efron et al. (2001), Efron (2004, 2005b)]. In such 
cases, the use of an empirical null distribution, estimated from the data it- 
self, can be critical for making correct inferences. Previous empirical null 
methods [Efron (2004, 2007b), Jin and Cai (2007), Efron (2008)] have fo- 
cused on situations where the theoretical distribution of the test statis- 
tics is A^(0, 1) or t, typically found, for example, in two-group microarray 
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gene expression studies. Other large-scale multiple testing problems present 
theoretical null distributions that are not normal or t. For instance, 
tests are commonplace in the analysis of genome-wide association studies 
based on single nucleotide polymorphisms (SNPs) [Van Steen et al. (2005), 
Kong, Pu and Park (2006)], while multivariate F tests appear in voxel-based 
analyses of brain imaging studies [Everitt and Bullmore (1999), Schwartz- 
man, Dougherty and Taylor (2005), Lee et al. (2007), Schwartzman et al. 
(2008b, 2008a)]. 

This paper extends the scope of the empirical null to distributions that 
belong to general exponential families, treating the normal and x^j as well as 
their counterparts t and F, as special cases. This extension allows the empir- 
ical null to be flexibly chosen as a parametric exponential family version of 
the theoretical null. For example, where the theoretical null A^(0, 1) may be 
replaced by an empirical null N{^,a'^) with arbitrary mean fi and variance 
o"^, a theoretical null x^(^o) with fixed i^q degrees of freedom may be replaced 
by a scaled density (i.e., gamma) with arbitrary scaling factor a and arbi- 
trary number of degrees of freedom [Schwartzman, Dougherty and Taylor 
(2008a)]. 

As a first data example, consider the following family-based study of 
genome-wide association between genetic variants and obesity based on 
the Framingham Heart Study (FHS) [Herbert et al. (2006)]. Briefly, genetic 
markers were obtained by genotyping 1400 probands from the family-plates 
on an Affymetrix lOOK SNP-chip containing 116,204 SNPs. Each SNP was 
tested for association with four body- mass index measurements at exams 1, 
2, 3 and 4 using the multivariate FBAT-GEE statistic [Lange et al. (2003)]. 
Excluding SNPs for which the number of informative families was less than 
20, a total of 95,810 test statistics were generated with theoretical null x^(4). 
Figure 1(a) shows that the histogram of the test statistics is not as well 
matched by the theoretical null x^(4) (see zoom-in) as by the empirical null, 
a scaled with 4.27 d.f. and scaling factor 0.95. The mismatch between the 
histogram and the theoretical null can be seen better in the p-value scale 
in Figure 1(b). The histogram of p- values according to the empirical null is 
closer to a uniform distribution than that according to the theoretical null. 

A second example where the effect is more dramatic is the brain imag- 
ing study analyzed in Schwartzman, Dougherty and Taylor (2008a). In brief, 
diffusion tensor imaging (DTI) scans were taken of 6 dyslexic and 6 nondyslexic 
children. After spatial registration, at each of 20,931 voxels a directional test 
statistic was computed for testing whether the first eigenvector of the mean 
diffusion tensor has the same 3D spatial orientation in both groups. The 
scores for each voxel were obtained by a quantile transformation from the 
theoretical null model -F(2,20) to x^(2)- Figure 2(a) shows a histogram of 
the 20,931 x^"Scores. The data histogram is not well matched by the theo- 
retical null x^(2) but is better described by the empirical null, a with 1.82 
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d.f. This is better seen in Figure 2(b). For p- values that are most Hkely null, 
say, higher than 0.1, the theoretical null produces a histogram that can be 
hardly explained by a uniform distribution. In contrast, the empirical null 
produces a histogram that is mostly uniform in that range. Moreover, the 
number of voxels with low p-values (less than 0.05) is higher according to 
the empirical null, indicating a gain in statistical power. Schwartzman et al. 
(2008b) show other examples of voxel-based analyses in brain imaging with 
normal and statistics where the empirical null is necessary for correct 
inference. 

The proposed method for fitting the empirical null, which I call 'mode 
matching', is a generalization of the central matching method for z-scores 
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Fig. 1. SNP example: (a) Histogram of the test statistics (light gray). Superimposed den- 
sities are the theoretical null (dashed) and the empirical null (solid) with pointwise 
standard 95% CIs. The histogram of the estimated alternative component and correspond- 
ing upper standard CI are shown in inverted scale. Inlet plot is a zoom-in. (b) Histogram 
of p-values according to the theoretical null (light gray) and the empirical null (black). 




Fig. 2. DTI example: (a) Histogram of the scores (light gray). Superimposed densities 
are the theoretical null x^(2) (dashed) and the empirical null (solid) with pointwise stan- 
dard 95% CIs. The histogram of the estimated alternative component and corresponding 
upper standard CI are shown in inverted scale, (b) Histogram of p-values according to the 
theoretical null (light gray) and the empirical null (black). 
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[Efron et al. (2001), Efron (2004, 2007b)]. Mode matching consists of fitting 
the empirical null to a region of the histogram of the test statistics surround- 
ing the mode, which for the normal distribution coincides with matching the 
center. Mode matching is presented here with a one-step approach, fitting 
the empirical null to the histogram directly by Poisson regression. This con- 
trasts with the two-step scheme of Efron (2007b), where a nonparametric 
density is first fitted to the histogram by Poisson regression and then the em- 
pirical null is fitted to the nonparametric density estimate by least squares. 
The one-step fit simplifies the theoretical analysis of bias and variance and 
avoids the need to tune additional parameters for nonparametric density 
estimation. But, most importantly, it highlights why mode matching is ef- 
fective for exponential families: for these the log-link function of the Poisson 
regression becomes linear in the regression parameters. 

The empirical null may be used with any multiple testing procedure. 
Nonetheless, mode matching is particularly suited for estimating the false 
discovery rate (FDR), a commonly used error measure in multiple test- 
ing problems [Benjamini and Hochberg (1995), Genovese and Wasserman 
(2004), Storey, Taylor and Siegmund (2004)]. Below I present formulas for 
calculating the local and tail FDR estimates and show that, as with central 
matching [Efron (2005b, 2007b)], these estimates follow easily from mode 
matching calculations for general exponential families. 

Delta method covariance formulas are derived for both the empirical null 
and FDR estimates. It is shown that these formulas produce variance esti- 
mates similar to those obtained by the bootstrap when the test statistics 
are independent. Permutations are used to respect the correlation between 
test statistics when they are not independent. 

Further, approximate formulas are derived for the bias of both the em- 
pirical null and FDR estimates. It is shown that the bias in the empirical 
null is driven mainly by the likelihood ratio between the alternative and null 
distributions. Simulations are used to inform the choice of the two tuning 
parameters of mode matching (histogram bin width and fitting interval) in 
terms of the bias-variance trade-off. For example, in agreement with Efron 
(2007b), it is found that in the normal case mode matching is fairly insen- 
sitive to the choice of bin width, but in the case the choice of bin width 
is affected by the curvature of the density, which sharply increases when 
the number of degrees of freedom is less than 2. The fitting interval plays 
a more important role, as it controls the bias introduced by the alternative 
distribution. In terms of FDR estimation, the bias formulas reveal that both 
the local and tail FDR estimates can be deceptively biased down for very 
high thresholds (low p- values), where the number of observed test statistics 
is low. I argue that this effect should be carefully taken into account when 
making inferences in real data sets. 
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The effect of dependence in the covariance of the empirical null and FDR 
estimates is explained in terms of the empirical distribution of pairwise cor- 
relation between test statistics in a way similar to Efron (2007a). I show 
that Efron's enigmatic "wing function" is a special case of the large family 
of Lancaster polynomials of bivariate exponential families, which reduces to 
the Hermite polynomials in the normal case and to the Laguerre polynomials 
in the case. 

Mode matching is both computationally efficient and easy to implement 
because it is based on Poisson regression, for which software is widely avail- 
able. The analysis is demonstrated in both the DTI and SNP examples 
introduced above. The SNP example demonstrates the bias, while the DTI 
example demonstrates the effect of correlation. While both examples have 
null distributions, I emphasize that the methodology is designed for general 
exponential families. Specific procedures and simulation results are shown 
for both the normal and ^ cases. 

2. Mode matching for exponential families. 

2.1. Setup. Let Ti, . . . ,T/v be a large collection of N test statistics. The 
two-class mixture model [Efron et al. (2001), Storey (2003), Efron (2004, 
2007b) Sun and Cai (2007)] 

(1) f{t)=PoMt) + {l-po)fA{t) 

specifies that a fixed fraction po of the test statistics behave according to a 
common null distribution with density /o (t) . The other test statistics behave 
according to alternative densities whose mixture is /^(t). The null density 
fo{t) is assumed unimodal. The zero assumption, needed for identifiability 
of the model, is loosely defined by Efron as the condition that most of the 
probability mass near the mode of f{t) is due to the null term pofoit), for 
example, po > 0.9 (the effect of overlap between the null and alternative 
components is discussed in Section 4). The objective of the empirical null 
methodology is to estimate po and /o from Ti , . . . , T/v . 

Mode matching begins by summarizing the data into a vector of histogram 
counts y = (yi, . . . ,yi^)' with yk = Eili HTi G B^}, k = I, . . . , K, ior K bins 
Bk centered at t = {ti, . . . ,1^)' ■ For simplicity, I assume all bins have the 
same width A, although this is not crucial. If the test statistics are inde- 
pendent, then, given N, the counts y follow a multinomial distribution with 
probabilities tt = (tti, . . . , ttk)' , T^k = P{Ti £ Bk). By the Taylor expansion 
around tk, 

(2) vTfc = / fit) dt = Afitk) + ^f"{tk) + • • • « A/(tfc). 

JBk 24 
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The approximation is vahd if the bin width A is small and the marginal 
density f{t) is smooth (the effect of curvature is discussed in Section 4). 
Thus, for large N, the scaled histogram 

(3) 

is a nearly unbiased estimate of f{t) at the bin centers t. 

The next step is to choose a closed interval Sq where the zero assumption 
may hold. Sq is the union of Kq < K consecutive bins containing the mode of 
f{t). For example, for a two-sided test with theoretical null A^(0, 1), 5*0 may 
be of the form 5*0 = [tmin,imax], while for a one-sided test with theoretical 
null , Sq may be of the form = [0, tmax]- Within ^o, the zero assumption 
makes (3) an estimate of the scaled null Po/o(^) ™ (1)) with additional bias 
(l-po)/A(t). 

Suppose /o(i) is a parametric density. Instead of maximizing the multino- 
mial likelihood given mode matching uses, almost equivalently, Poisson 
regression. The idea, also called Lindsey's method [Efron and Tibshirani 
(1996), Efron (2007b)], is to consider the number of tests as a Poisson 
variable N ~ Po{'j). If the test statistics are independent, then the histogram 
counts become independent Poisson variables yk ~ Po{Xk) with = 77rfc. 
If N is large, this is essentially the same as the usual Poisson approxi- 
mation to the multinomial. Using (2), we have = 77rfc ~ ^Af{tk)- Thus, 
within Sq, the zero assumption leads to the general Poisson regression model 
yk ~ Po(Afc) with 

(4) Afc 7Apo/o(tfc), tk^So, 
where 7 is replaced by its MLE, the observed count N. 

2.2. Exponential families. Since the link function for Poisson regression 
is logarithmic, the precise parametric form of /o(t) needed to make log(Afc) 
in (4) linear in the parameters is an exponential family. Let 

(5) /o(t) = <7o(i)exp(a3(t)'77-V(r?)), 

where gQ{t) is the carrier density, t] is the vector of canonical parameters, 
x(t) is the sufficient vector and V'(^) is the cumulant generating function. 
Replacing in (4) gives the linear Poisson regression model y^ ~ Po(Afc) with 

(6) log{Xk) =x{tkyv + C + hk, 
where the entries of x{tk) play the role of predictors, 

(7) C = C{r]) = logpQ-^P{rJ) 

is a constant intercept, and = \og{N AgQ{tk)) is an offset. It is convenient 
to write model (6) in vector form as 

(8) log(A) = Xr]+ + h, 
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where A = {Xi, . . . , Xk)' , r]^ = {C,ri'y is the augmented parameter vec- 
tor, the design matrix X has rows (l,a;(tfc)') for k = 1, . . . ,K , and h = 
{hi, . . . , hx)' ■ The fit is restricted to the interval 5*0 by providing the Pois- 
son regression algorithm with an external set of weights w = {wi, . . . jWk)' , 
where Wk is equal to 1 or according to whether tk is in or not. For 
later use, define the diagonal matrix W with diagonal equal to w (not to be 
confused with the weighting matrix used internally in the iterative solving 
of the Poisson regression). 

Solving (8) gives estimates r)^ = {C,rf)\ which include the empirical null 
parameter estimates t). From these, an estimate of the null probability pQ 
is also obtained using (7) as po = exp(C' + ^(^)). Notice that pq is not 
constrained to be less than or equal to 1. The predicted histogram counts 
A = A^A/o(t) = y = (yi, . . . , corresponding to the empirical null for all 
bins (not just within ^o) are 

(9) y = exp{Xr]+ + h). 

As a result, the predicted histogram counts corresponding to the alternative 
component in (1) are 

(10) NA{1 - po)fA{t) = NA{f{t) - pofoit)) = y-y. 

Empirical null densities are more naturally specified using the usual pa- 
rameters of the distribution rather than the canonical ones. When the the- 
oretical null is A^(0, 1), the empirical null is N{ii,a'^) with 9 = (^,cr^)' 
[Efron (2004, 2007b)] [t-statistics are handled by a quantile transformation 
to A'^(0, 1)]. When the theoretical null is with uq d.f., an appropriate em- 
pirical null is a scaled with u d.f. and scaling factor a, denoted ax^{v), 
with density 

ill) fJt) = i ~t/{2a)^u/2-l 

where = {a,u)' [Schwartzman, Dougherty and Taylor (2008a)]. This is the 
same as a gamma density with shape parameter u/2 and scaling parameter 
2a, but using the notation helps keep the connection to the theoretical 
null. F-statistics are handled by a quantile transformation to with the 
same numerator number of degrees of freedom. 

Let 6 = 6[r}) denote the vector of usual parameters as in the normal 
and examples above. Let 0^ = ilogpo^O')' be the augmented parame- 
ter vector. The MLE of 6^ is 6^ = {\ogpQ,9{f])')' . The derivation of these 
parameter estimates from the canonical parameter estimates for both the 
normal and cases is worked out in Appendix A. 

Other distributions are treated in a similar way. For p- values, whose the- 
oretical null is uniform, the empirical null may be a beta distribution with 
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fitting interval Sq = [tmmA]- If the theoretical null is a discrete exponen- 
tial family (e.g., binomial, Poisson, negative binomial), the mode matching 
procedure is the same as above except that the bins width A = 1 is auto- 
matically set by the discrete nature of the distribution, making equations 
(2) and (4) exact rather than approximate. 

2.3. Exponential subfamilies. In some cases, one may want to adjust 
only some of the parameters in (5) and leave the others fixed as prescribed 
by the theoretical null. For instance, the microarray analysis examples in 
Efron (2007b) suggest the empirical null A^(0,cj^), while in some fMRI 
studies involving z-scores, an appropriate empirical null may be N{fj,, 1) 
[Ghahremani and Taylor (2005)]. If fixing some parameters results in an- 
other lower dimensional exponential family, then the procedure is similar 
to the one above after the canonical parameters have been redefined. Let 
?7+ = {C,T][,r]2y , where rji is the vector of canonical parameters to be es- 
timated and is the vector of parameters whose values are fixed. Let 
X = {1k,Xi,X2) be the corresponding split of the design matrix, where 
Ik indicates a column of K ones. The regression equation (8) becomes 

(12) \0g{X) = lKC + XiT]^ + {X2V2 + h) 

and is solved as before, except that the fixed term -X^2^2 is absorbed into 
the offset in parenthesis. The specific exponential subfamilies of the normal 
and cases are worked out in detail in Appendix A. 

The simplest restricted case is where one believes the theoretical null and 
no adjustment of parameters is necessary, except for po [Efron (2004)]. In 
that case, only the intercept C needs to be estimated in (12), treating all the 
other terms as offset. The estimate of po is then given by po = exp{C + tp{r])). 
Notice that, for the regression (12) to remain linear, pQ cannot be fixed a 
priori. 

2.4. Covariance estimates. Covariance estimates for the empirical null 
parameter estimates r)"*" can be obtained by the delta method in a way 
similar to Efron (2005b). For this we first need an estimate of the covariance 
of y. As noted by Efron and Tibshirani (1996), there are two such estimates. 
The Poisson regression regards the observations yk as independent, so its 
infiuence function is determined by the covariance estimate cdv{y) = V = 
Diag(^), a K X K diagonal matrix with diagonal entries y^. On the other 
hand, the true covariance of y depends on the dependence structure of the 
test statistics. 

Suppose first the test statistics are independent. Then conditional on N, 
the i/k are multinomial, for which an appropriate covariance estimate is 

(13) Vn = Diag(y) - yy'/N. 
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Proposition 1. Let ip{f)) and 9{'fj) denote the derivatives of if) and 6 
with respect to ij evaluated at i). The delta method covariance estimates of 
ff^ and 0^ are respectively 

(14) c5v(r)+) = {x'wvxy^x'wvNWx{x'wvxy^ 

(15) aw(0+) = Z)aw(r)+)Z)', = Q f^^^^. 

Proposition 2. The delta method covariance estimate of the empirical 
null fits (9) and the empirical alternative component (10) are respectively 
c5v{y) = iVDy)VN{VDyy and c5v(y - y) = {I - VDy)VN{I - VDy)', 
where Dy = dilogy) / dy' is given by 

(16) Dy = x{x'wvxy^x'w. 

The above covariance estimates become more accurate as N increases. 

If the test statistics are mildly correlated, the Poisson regression scheme 
may still be used to fit the empirical null, but the covariance estimates need 
to change. In this case, the delta-method covariance formulas in Propositions 
1 and 2 are applied with replaced by a covariance estimate other than 
(13) that reflects the correlation between the bin counts. This is illustrated 
below in the DTI example using permutations. Alternatively, one may fit the 
empirical null including an over dispersion parameter in the Poisson regres- 
sion. The overdispersion parameter (f> is estimated by the quasi-likelihood 
MLE (p = {1/ K)J2k=iiyk ~ ^k)/^k- The Poisson regression fit is the same 
as before, but the covariance estimates above are inflated by a factor (j). 

2.5. The SNP data. Recall the SNP data set described in Section 1. 
The histogram in Figure 1(a) was constructed using bins of width A = 0.1 
starting from zero. The empirical null was obtained using mode matching 
(Appendix A. 2). The fitting interval was defined as 5*0 = [0, 20], wide enough 
to use most of the data without including the far tail region t > 20, where 
discoveries are likely to be made. These choices are discussed in Section 4. 

The estimated parameters 0^ are listed in Table 1. Assuming indepen- 
dence of the test statistics, the associated standard errors (SE) were com- 
puted as the square root of the diagonal of (15) using the multinomial covari- 
ance (13). For comparison, I also used the bootstrap as follows. Again assum- 
ing independence, repeated resampling with replacement from {Ti, . . . ,r/v} 
gave sets {T^*, . . . , T^}, each leading to a parameter estimate (6^)*. The 
bootstrap covariance estimate of 6^ was computed as the empirical covari- 
ance of the (6^)* and the SEs as the square roots of the diagonal elements of 
this covariance. Notice that the delta-method SEs are only slightly smaller 
than the bootstrap SEs. 
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The CIs for a and z> do not include the theoretical values 1 and 4, 
indicating a significant departure from the theoretical null in both scal- 
ing and degrees of freedom. The CI for log(po) includes 0. This does not 
prove that there are no significant SNPs, but it shows that the study may 
not have enough power to discover them. The lower bound of the CI for 
log(po) suggests that the fraction of non-null SNPs may be as high as 
l-po -log(po) = 3.18 X 10-^ which is about 3 SNPs out of iV = 95,810. 
If instead of fitting the full empirical null, pq is estimated alone believing 
the theoretical null, the result is log(po) = 1-24 x 10~^ with standard CI 
[1.96 X 10-^,2.28 X 10"'']. The theoretical null does not admit an estimate 
of po that is less than 1, again indicating that the theoretical null is unsuit- 
able for this data. 

The SEs in Table 1 are smaller than they would be if the dependence be- 
tween the test statistics were taken into account. I did not take into account 
the dependence because I did not have access to the original data but only 
to the FBAT test statistics. Given the complexity of the FBAT procedure, 
pairwise correlations between test statistics would be hard to estimate. It 
has been claimed that the SNPs in this dataset are not highly correlated 
because of their widespread locations on the genome [Herbert et al. (2006)]. 
One indication that the correlation may not have a large effect is that the 
estimated over dispersion from fitting the empirical null is (j) = 1.090 (boot- 
strap SE = 0.066), not significantly larger than 1. 

2.6. The DTI data. For the DTI data set, the histogram in Figure 2(a) 
was constructed using bins of width A = 0.05 starting from zero and the 
empirical null was obtained using mode matching (Appendix A. 2). The 
fitting interval was defined as 5o = [0,4.5]. These choices are discussed in 
Section 4. 

The estimated parameters 0^ are listed in Table 2. The associated SEs 
were computed as the square root of the diagonal of (15) using two different 
values for Vn- The naive estimate (column 4) assumes independence of 
the test statistics and uses the multinomial covariance (13). The estimate in 

Table 1 

SNP example: Theoretical null parameters (column 2) and empirical null estimates 
(column 3). Included are delta-method and bootstrap SE (columns 4 o,nd 5) and standard 
95% confidence intervals based on the delta-method SE (column 6) 



e+ 


Theory 


0+ 


SE (15) 


Bootstrap SE 


95% CI 


log(po) 





7.49 X 10"^ 


5.45 X 10"^ 


5.89x10^5 


[-3.18,18.2] X lO"'' 


a 


1 


0.9509 


0.0046 


0.0048 


[0.9419,0.9600] 


V 


4 


4.2675 


0.0183 


0.0199 


[4.2316,4.3034] 
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column 5 replaces V^at by a permutation estimate V p similar to the one used 
by Efron (2007a), obtained as follows. In the original data, the group labels 
of the two groups of 6 subjects were permuted, for a total of 924/2 = 462 
distinct permutations (the test statistics are symmetric, yielding the same 
value if the groups are swapped). For each permutation, the 20,931 test 
statistics T* were recomputed and a vector y* of histogram counts was 
produced. The permutation covariance estimate Vp was computed as the 
empirical covariance of the y* . 

This permutation scheme relies on the subjects being independent but 
preserves the correlation structure between the test statistics. While validity 
of the permutations requires the complete null assumption, removal of the 
mean effects is difficult in this case because of the directional nature of 
the data. Yet, since pq is close to 1, the null hypothesis is valid in most 
voxels and may be enough for the purposes of estimating global parameters, 
such as those of the empirical null. Table 2 shows that taking into account 
correlation via the permutation scheme about doubles the naive SEs. 

Based on the permutation SEs, the CI for a includes the theoretical value 
1, while the CI for v does not include the theoretical value 2. This indicates a 
significant departure from the theoretical null in degrees of freedom but not 
scaling. The CI for po does not include 1 and it is estimated that there are 
(1 —p())N = 745 non-null voxels in the data. The estimated over dispersion is 
<j) = 1.332 (permutation SE = 0.087), significantly larger than 1 as expected 
from the dependence. 

If instead of fitting the full empirical null, po is estimated alone believing 
the theoretical null (blue curve in Figure 2), the result is po = 0.989 with 
naive CI [0.984,0.994] and permutation CI [0.955,1.024]. The theoretical 
null not only provides a poor fit to the data but gives a higher estimate of 
Po than the empirical null, so one may say it is less powerful. 

3. FDR inference. 

Table 2 

DTI example: Theoretical null parameters (column 2) and empirical null estimates 
(column 3). Included are delta-method SEs assuming independence (column 4) and using 
permutations (column 5). The standard 95% confidence intervals are based on the 
permutation SEs (column 6) 



9+ 


Theory 


0+ 


SE (15), ind 


SE (15), perm 


95% CI, perm 


Po 


1 


0.964 


0.0042 


0.0080 


[0.949,0.981] 


a 


1 


0.961 


0.0225 


0.0679 


[0.828,1.094] 


V 


2 


1.817 


0.0223 


0.0377 


[1.743,1.891] 
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3.1. FDR estimates. Mode matching is particularly convenient for FDR 
estimation, as FDR estimates follow immediately from the Poisson regression 
fits (9). Let F(t) =poFQ(t) + (1 —po)FAit) be the cumulative version of (1). 
Recall that the local false discovery rate (fdr) and (positive) right-tail FDR 
are given respectively by 

fT7\ f+\ Pofojt) f.^ Po{l-Fo{t)) /f°°fdr(n)/(M) du 

^'^^ "^^^^^'^^ I -Fit) = irfNdu 

[Efron et al. (2001), Efron (2004)]. Using (3) and (4), the local fdr at the 
bin centers tk is estimated by 

(18) fdr(4) = MM = ^ 
^ ^ ^ ' f{tk) yk/{NA) 
defined whenever > 0, or in vector form as 

(19) logfdr = logi) — log?/, 

where y is given by (9). In contrast to Efron (2007b), I estimate Fdr^j at 
the bin centers based on the right side of (17) by 

, (l/2)fdr(tfc)/fa)+Ef=fe+ifdr(t,)/(t,) 

(l/2)m.)+Ef=fc+i/(t,) 

(20) 

(i/2)yfe + Ef= fc+1 Vj 
where (3) and (18) were used. This can be written in vector form as 

(21) logFdr^ = log(5y) - \og{Sy), 

where S is an upper triangular matrix with entries 1/2 on the diagonal and 
1 above the diagonal. The estimate (21) is easy to analyze theoretically in 
terms of bias (see Section 4). For the left tail FDR, definition (17) is changed 
to FdrL(t) = poFo{t)/ F{t) and is estimated similarly by 

(22) log Fdr^ = log{S'y) - log{S'y) , 

where S' is the transpose of S, a lower triangular matrix with entries 1/2 
on the diagonal and 1 below the diagonal. 

Proposition 3. (a) The delta method covariance estimate of the local 
fdr (19) is cOT(logftii-) = AVnA', where A = d{\ogM!r) / dy' = Dy - V'^ , 
V = Diag(t/) and Dy is given by (16). 

(b) The delta method covariance estimate of the right tail FDR (21) 

is cOT(logFdrR) = BVnB', where B = dilogYdrR) / dy' = U'^SVDy - 
and U = Diag{Sy) , U = Diag{Sy) . The formula for the left tail FDR 
(22) has the same form with S replaced by S' . 
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Fig. 3. DTI example: Right tail FDR estimates using both the theoretical (blue) and the 
empirical null (red). Included are pointwise standard 95% CIs using the naive multinomial 
covariance estimate (dashed) and the permutation covariance estimate (dotted). 



As with the empirical null covariance estimates of Section 2.4, the FDR 
covariance estimates above become more accurate as N increases. 

3.2. The DTI data. Figure 3 shows the right tail FDR for the DTI data, 
computed by exponentiating (21). The CIs were computed by exponentiating 
the CIs for logEdr^ , based on SEs equal to the square roots of the diagonal 
elements of the delta- method covariance estimates given by Proposition 3(b). 
The narrow CIs correspond to the naive multinomial covariance (13), while 
the wide CIs correspond to the permutation covariance Vp described in 
Section 2.6. In Schwartzman, Dougherty and Taylor (2008a) it was claimed 
that the empirical null was more powerful because the FDR curve is always 
below the FDR curve for the theoretical null, yielding lower thresholds. 
However, Figure 3 shows that when the dependence is taken into account 
by using the permutation covariance, the empirical null FDR curve has too 
much variance. While the estimate of po found before indicates that there 
are non-null voxels in this data, the variance of the FDR estimates makes 
the results inconclusive. 

I defer the FDR analysis of the SNP data to after the following discussion 
on bias. 

4. Tuning parameters and bias. Mode matching is controlled by two 
tuning parameters, the bin width A and the fitting interval Sq. Each is 
connected to a different source of bias: the bin width A controls the bias 
incurred by using a first order approximation in (2); the fitting interval Sq 
controls the bias incurred by the inclusion of the alternative component 
(1 — Po)/a in the fit of the empirical null. In what follows, I refer to the 
following two simulation scenarios of model (1): 

(9-1) T '"'^ / = A^(0.2, 1.2^), probability po, 
^ ^ \ fA = iV(3, 1.22), probability l-po, 
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(24) 7^. iJLdr/o = 0.8x^(3), probability po, 

* \ = noncentralx^(3, 5 = 3), probability 1 — 

where 6 = 3 denotes the noncentrality parameter. The fitting interval is set 
to So = [0.2 — to, 0.2 + to] in the normal case and 5o = [0,to] in the case, 
so that in both cases Sq is tuned by the single number to- 



4.1. The bin width. The first and smallest source of bias is the use of a 
first order approximation in (2). Under the zero assumption, the error may 
be approximated by the next expansion term 

(25) ^,_/o(t,)Ap.:^A3, tkeSo. 

As in nonparametric density estimation, bias is reduced by thinning the bins. 
However, the size of the bias depends also on the curvature of the empirical 
null. If fo is N{fi,a'^), the largest curvature occurs at the mode where 
/o (a*) ~ -0.4/cj^. Thus, the error (25) is bounded by O.OUA^/a^. This is 
about 1.3 X 10~^ if A = 0.2a. If /o is ax^(i^), the curvature depends strongly 
on z/. For v < 6 except z/ = 2, the curvature is unbounded at t = 0, but it 
decreases rapidly as t increases away from 0. For u >2, more important is 
the curvature at the mode, where most of the probability mass lays. This 
curvature decreases rapidly as u increases away from 2. For example, for 
u = 3, the curvature at the mode u — 2 is /q (z^ — 2) ~ — 0.12/a^ so the error 
(25) is bounded by 0.005AVa^- This is about 4 x 10~^ if A = 0.2a. 

The effect of A is illustrated in Figure 4. The plotted empirical null es- 
timates are averages over 100 simulated instances of models (23) and (24) 
with pq = 1, N = 10,000, and fixed to = 1 in the normal case and to = 4 in 
the case. In contrast to nonparametric density estimation, the variance 
is remarkably insensitive to A. Within the plotted range, the variance is 
smallest for the smallest value of A. This is because, for fixed Sq, a small 
A implies a large number of bins K = \Sq\/A and thus a large number of 
design points for the Poisson regression. Therefore, one may want the small- 
est possible A as long as the number of counts within each bin in Sq 
remains large, say, in the hundreds. Of course, this depends on the number 
of tests N. A larger N allows a smaller A. The caveat is computational. 
A large number of bins K implies inverting large matrices in the fitting of 
the empirical null. For N = 10,000, based on Figure 4 and computational 
considerations, A = 0.05 ~ 0.1 seems a reasonable choice for the normal and 
X^(z^) with 1/ > 2. If 1/ < 2, the curvature near t = demands much smaller 
values of A to avoid substantial bias. For large the increase in the ef- 
fective support of the density may require increasing A in order to reduce 
computations. 



EMPIRICAL NULL FOR EXPONENTIAL FAMILIES 



15 



1.4 

1.2 
t 

I O.S 

|e.B 

0.4 



10" 



0%312 o.Qess 0.1 2S 
Oete 



0.0312 O.Ce^S 0.1^5 
b Delta 




Fig. 4. Effect of the bin width A. Top panels: normal simulation, (a) Estimates of po 
(black solid), ^ (lower gray solid) and a (upper gray solid). The thick dashed lines are 
simulated ±2 standard errors, very close to the values predicted by formula (15). The thin 
dashed lines indicate the true values po — l,fiQ = 0.2, ao = 1.2. (b) Simulated MSE for po 
(black), p, (solid gray) and a (dashed gray). Bottom panels: simulation, (c) Estimates 
of Po (black solid), a (lower gray solid) and v (upper gray solid). The thick dashed lines 
indicate simulated ±2 standard errors, very close to the values predicted by the formula 
(15). The thin dashed lines indicate the true values po = 1, ao = 0.8, fo ~ 3. (d) Simulated 
MSE for po (black), a (solid gray) and u (dashed gray). 



4.2. The fitting interval. The largest source of bias in the estimation of 
the nuh density is the inclusion of the alternative component (1 — po)fA 
within Sq. The asymptotic bias for large is quantified in the following 
proposition. 



oo 



Proposition 4. Define the vectors /q = fo(t), = /yi(t), and let r), 
and denote the deterministic limits of the estimators r)"*" and 6^ as 
A^— >oo. The asymptotic biases of i)^ and 6^ are given approximately by 

, , 7)+ -T7+«(l-po)(X'WDiag(/o)X)-ix'l^(/^-/o) 
(26) 

-(log(po),0')' 
and 0+ — 0+ f« D{f]^ — r]~^), where D is given by (15). 



Roughly, the asymptotic bias (26) is proportional to the likelihood ratio 
between fA and /o within Sq, but modified by the Poisson regression. For 
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Table 3 

Asymptotic bias in the estimation of 6^ ■ The limit 6'^ = (^(flto) ^o,^ computed from the 
solution to the limiting score equation (38) 



Null 




9+ 


Sao 


et -6+ 


Formula (26) 


Normal 


log(po) 


= -0.1054 


-0.0801 


0.0253 


0.0289 






= 0.2 


0.2265 


0.0265 


0.0250 






= 1.44 


1.4907 


0.0507 


0.0480 




log(po) 


= -0.1054 


-0.0331 


0.0723 


0.0749 




a 


= 0.8 


0.8449 


0.0449 


0.0457 




V 


= 3 


2.9789 


-0.0211 


-0.0210 



fixed A, the fitting interval controls the number of columns of X . Obviously 
the bias is zero if po = 1. The approximation (26) is valid for po close to 1. 
Its accuracy is shown in Table 3 for po = 0.9. 

Figure 5 shows the empirical null parameter estimates averaged over 100 
instances of the simulations (23) and (24) with pq = 0.9 and A'^ = 10,000. 
The simulations were repeated for varying to and fixed A = 0.1. Increasing 
to increases the bias due to the inclusion of the alternative component. On 
the other hand, increasing to also increases the number of design points 
for the Poisson regression, reducing variance. The bias is worse in the 
simulation because the null and the alternative densities overlap more than 
in the normal simulation, even though they have a similar separation in 
their mean. All parameters except the d.f. u of the tend to be biased 
upward. This implies that the empirical null is conservative, predicting a 
smaller contribution of the alternative density in the mixture than there is. 

The choice of to is more difficult than that of A. Ideally, one may want 
the largest to that does not result in a substantial bias. However, the bias 
depends on po and on the alternative density /A(i), both of which are un- 
known. The MSE plots in Figure 5 show that in the normal simulation, 
the optimal to is in the range 1.4 ~ 1.9, corresponding to about 1.2 ~ 1.6 
standard deviations of the true null A^(0.2, 1.2^). Notice that the optimal 
to is not the same for all the parameters. In the simulation, the optimal 
to is in the range 3.2 ~ 6, corresponding to the 74 ~ 94 percentiles of the 
true null 0.8x^(3). In the DTI example (Section 2.6), I chose to = 4.5, which 
corresponds roughly to the 89.5 percentile of the x^(2) distribution. In the 
SNP example (Section 2.5), po is extremely close to 1, allowing to to be 
much larger. There I used to = 20, which is the 99.95 percentile of the x^(4) 
distribution. 

4.3. Bias in FDR estimation. One issue overlooked by Efron (2007b) is 
that the local fdr estimate (18) is biased by definition. The bias is given by 
the following proposition. 
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Fig. 5. Effect of the fitting interval width to. Top panels: Normal simulation, (a) Esti- 
mates of po (black solid), fi (lower gray solid) and a (upper gray solid). The thick dashed 
lines indicate simulated ±2 standard errors, very close to the values predicted by the for- 
mula (15). The thin dashed lines indicate the true values po = 0.9, no = 0.2, ctq = 1.2. 
(b) Simulated MSE for po (black), fi (solid gray) and a (dashed gray). Bottom panels: 
simulation, (c) Estimates of po (black solid), a (lower gray solid) and v (upper gray 
solid). The thick dashed lines indicate simulated ±2 standard errors, very close to the val- 
ues predicted by the formula (15). The thin dashed lines indicate the true values po = 0.9, 
ao = 0.8, vo = 3. Notice the vertical scale is different from panel (a), (d) Simulated MSE 
for Po (black), a (solid gray) and v (dashed gray). 



Proposition 5. (a) /f po o.'^i fo ci'^e known, then under the Poisson 
model (4), E[fdrfc|?/fc > 0] = fdifc Cih), where 

(27) c(A) = ^r-r/ du. 

(b) If Po and /o are estimated by mode matching, then, for large N and 
tk ^ 'S'o; E[fdrfc|yfc > 0] ~ fdrfc6fcC(Afc), where is the kth entry of the asymp- 
totic bias vector b^o = exp{X {fj^ — iT^)) with the inner parenthesis given 
by (26). 

The bias factor (27) appears because for the local fdr denominator y^ ~ 
Po{Xk), is biased for 1/Afc. Figure 6(a) shows a plot of the function C('^)) 
closely related to the so-called exponential integral [Abramowitz and Stegun 
(1966), Chapter 5]. Since increases with 7 (i.e., A^), most bins fall on the 
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Fig. 6. FDR bias in the normal simulation, (a) The proportional FDR bias function 
C(A). In plots (b)-(f), the black solid Ime is the simulated average estimate, the thick 
dashed lines are 5 and 95 percentiles of the simulation, and the thin dashed line is the true 
value, (b) Local fdr using theoretical null, (c) Local fdr using theoretical null (zoom in). 
(d) Local fdr using empirical null, (e) Right tail FDR using theoretical null, (f ) Right tail 
FDR using empirical null. 

right end of the plot, making the local fdr estimate in most bins slightly 
conservatively biased above the correct value. However, Afc is always small 
in the far tails of the null density /o, making the FDR estimate biased down. 
This can be misleading when the true FDR is close to 1. 

Figure 6(b) shows the local fdr estimates for 100 instances of the normal 
simulation (23) with known po = 0.9 and N = 10,000. Because the alternative 
density sits on the right side of the plot, the true local fdr at the left tail is 1. 
The local fdr estimate, however, follows the graph of C(-^)) as the bin counts 
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get smaller toward the left, with the variance proportional to that graph. 
Any particular realization of the FDR curve here may give the impression 
that there is something to discover at the left tail, while in reality there is 
not. The zoom-in in panel c shows that the phenomenon still occurs in the 
right tail, as the average FDR estimate is first biased up with higher variance 
and then dips below the truth as the bin counts get very small. This is not 
noticeable in panel b because, by Proposition 5, the bias is proportional to 
the true FDR, which is low in this region. When the empirical null is used, 
the additional bias and variance are visible in panel d. The additional bias 
is captured by the factor bj. of Proposition 5(b), where the approximation is 
the result of using the asymptotic bias for large N derived from Proposition 
4. Notice in Figure 6(d) that the bias is up, so the FDR estimates are 
conservative. 

The tail FDR also suffers from a similar bias phenomenon, being sensitive 
to small cumulative bin counts in the far tails. Following a similar argu- 
ment as in Proposition 5(a), when po and /o are known, (20) says Fdry^ = 
{SXo)k/ {Sy)k, where the cumulative denominator yk/2 + J2iLk+i Vi behaves 
similarly to a Poisson random variable with mean {SX)k = ^kf^ + ^iLk+i ^i- 
Therefore, E[Fdrk\yk >0]= Fdr^ ■E[{SX)k/{Sy)k\(,Sy)k > 0], where the con- 
ditional expectation behaves approximately like ^[(5^)^]. A simulation us- 
ing po = 1 (not shown) gives FDR curves that are very similar to those on 
the left end of panels b and d. In Figure 6(e) (zoom- in not shown), the bias 
is visible but small because the FDR itself is low in the right tail. Panel f 
shows the increase in bias and variance when the empirical null is used. 

The bias phenomenon does not contradict the results of Storey, Taylor 
and Siegmund (2004), which claim asymptotic unbiasedness of the tail FDR 
estimator. Consider a fixed bin k. As N increases, the expected bin count 
increases and the operating point in Figure 6(a) moves to the right, making 
the FDR estimate asymptotically unbiased. The CW phenomenon appeals 
to practical cases where is large but finite, so that the bin counts at the 
tails are still small. 

Efron (2004, 2007b) circumvented the bias problem by smoothing the 
histogram with a spline fit. This helps because the compounding of data at 
each bin pushes the operating point in the C(A) graph [Figure 6(a)] to the 
right. However, smoothing introduces a bias of its own. Simulations using 
smoothing show that the resulting estimates at the far tails, especially when 
Pq = 1, are not reliable, as they are very sensitive to the choice of knots or 
smoothing bandwidth. 

4.4. The SNP data. The FDR analysis is summarized in Figure 7. Here I 
focus on the local fdr, which in this case is more powerful than the tail FDR. 
The observed local fdr estimates (18) are compared to their expected value 
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Fig. 7. S'A'P example: local fdr estimates using the empirical null. Top panel: full range. 
Bottom panel: zoom-in including standard 95% CIs. In both panels the dashed line is the 
expectation of the local fdr estimate under the complete null. 



under the complete null, computed by replacing for in Proposition 
5 and setting fdr^ = 1 and bk = 1- In agreement with Proposition 5, the 
expected local fdr estimate goes down at the far tails where the number 
of counts is small. Three bins stand out with observed local fdr estimates 
significantly below the dashed line. The results for these bins are reported in 
Table 4. While the observed local fdr values are relatively low, they turn out 
to be not as low when their bias is taken into account. The adjusted local 
fdr values in Table 4 are not precise at correcting the bias, but they hint 
that about 50% of the 8 SNPs contained in these 3 bins might be associated 
with obesity. This result agrees with the assessment of Section 2.5 that the 
nonnull distribution may contain about 3 SNPs. 

5. The effect of correlation. When the theoretical null is A^(0, 1), Efron 
(2007a) showed that the over dispersion in the covariance of the empirical null 
with respect to the multinomial covariance can be described as a multiple 
of a "wing function," where the multiplying factor captures the variance of 
the pairwise correlations between the test statistics. This result extends to 
exponential families as follows. 



Table 4 

SNP example: Bins with local fdr significantly below the expected local fdr under the 
complete null. Column 3 contains standard 95% CIs based on the delta-method SE. 
Column 4 is the observed local fdr from column 2 divided by the expected local fdr for the 
complete null at that bin. Column 5 is the number of SNPs in the bin 



tk 


fdrfe 


95% CI 


Adjusted fdrfe 


# SNPs 


21.65 


0.2831 


[0.1546,0.5184] 


0.4169 


3 


21.85 


0.2575 


[0.1445,0.4587] 


0.4082 


3 


24.35 


0.1173 


[0.0726,0.1896] 


0.5310 


2 
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Suppose the theoretical null /o is one of the Lancaster distributions, that 
is, the exponential families normal, gamma, Poisson or negative binomial 
[Koudou (1998)]. Given two random variables Tj and Tj with correlation p, 
these distributions admit a bivariate model where both Tj and Tj have the 
same marginal density /o and their joint density is given by 

oo n 

(28) foitutf,p)=foiti)foiti) ^Ln{ti)Ln{t,), 

OTl . 

where Ln{t) are the Lancaster orthogonal polynomials with respect to /q: 
Hermite if /o is normal, generalized Laguerre if /o is gamma, Charlier if 
/o is Poisson, and normalized Meixner if /o is negative binomial. In par- 
ticular, when /o is normal, the expansion (28) is known as Mehler's for- 
mula [Patel and Read (1996), Kotz, Balakrishnan and Johnson (2000)] and 
is equal to the standard bivariate normal with correlation coefficient p. 

Theorem 1. Let /o be one of the Lancaster distributions and assume 
that under the complete null every pair of test statistics (Ti,Tj) has a bi- 
variate density given by (28) with marginals foit) and corr(Tj,Tj) = pij. Let 
E{p'^), n = 1, 2, . . . , denote the empirical moments of the N{N — 1) correla- 
tions pij , i < j ■ Then the covariance of the vector of bin counts y is 



(29) cov(y) 
where 



Diag(A) - ^ 



+ (l-i^)Diag(A)(5Diag(A), 



oo 



(30) s = Y,^^Ln{t)Ln{ty 

and Ln{t) denote the Lancaster polynomials evaluated at the vector t. 
When fo{t) is N{p,a'^) then (30) becomes 

n=l 

where ILn{t) are the Hermite polynomials: Ho{t) = 1, Hi(t) = t, H2(t) = 
— 1, and so on. In particular, setting p = 0, a = 1, E{p) = 0, and truncating 
the series at n = 2 gives precisely the result of Efron (2007a), Theorem 1: 



(31) cov(y) 



Diag(A) - — 



+ [l-l]E{p')w2w'^, 



where W2 = Diag(A)i72(i)/\/2 is the "wing function" vector with compo- 
nents uj2,k = ^^foi'tk){'tk — l)/\/2- The above extension to other Hermite 
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polynomial orders is recognized in Remark E of Efron (2007a) but not pre- 
cisely formulated. 

When fo{t) is the a'x^{v) density (11) then (30) becomes 



n\ T{u/2 + n) " \2a) " 



2a 



where L^n'^~^\t) are the generalized Laguerre polynomials of degree v/2 — \: 



L{'/^-^\t) = -t + iy/2 
W2-1), 



L'^''^-'> (t) = t'- 2{v/2 + l)t + {u/2){v/2 + 1) 

and so on. Here there is no reason to assume E{p) = 0. A similar approxi- 
mation to (31) to the first order is 



(32) 
where 
(33) 



cov(y) 



Diag(A) 



+ 1 



1 



N 



E{p)wiw[, 



Wi 



ni«.m M>/2)_.W2-i)/t \ 
^^^^^^Vr(z./2 + i)^i UJ 



is the corresponding first order "wing function" vector with components 
wi,k = NAy'T{iy/2)/T{iy/2 + l)/o(tfc)[tfc/(2a) - u/2]. 

5.1. The DTI data. Recall the permutation estimate Vp of the covari- 
ance matrix cov(y) obtained in Section 2.6. For this dataset, the largest 
eigenvalue of V^at = Diag(A) — X\' /N is only 4.1% of the largest eigenvalue 



0.05 



-0.05, 




Fig. 8. DTI example: First eigenvector of the permutation covariance estimate (blue) 
and first- order "wing function" (green). 
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di of Vp, while the second eigenvalue d2 oi Vp is only 6.6% of di. Thus, 
using (32), we can approximate 

(34) Vp^Eip)w,w[, 

where wi is the first-order "wing function" (33) evaluated using the em- 
pirical null estimates A, u and a from Section 2.5. Figure 8 shows the first 
eigenvector of superimposed with Wi normalized to unit norm, that is, 
wi/llwill with ll'Ji'ilP = X^A—i 'fi'i fc- The similarity between the two is akin 
to the similarity between the 1st eigenvector of the permutation covariance 
estimate and the 2nd order wing function in the normal example described 
in Efron (2007a). 

Given the equivalence (34), one might be tempted to estimate the average 
pairwise correlation between the test statistics as E{p) = diWwiW^ = 0.0057, 
where di is the largest eigenvalue of Vp. This is a strongly conservative 
estimate, as it suffers from the upward sorting bias of di. The true aver- 
age pairwise correlation is probably much lower. This is reassuring as the 
correlation is presumably mostly local in the image domain. 

6. Summary and discussion. In this article I have extended the central 
matching method for estimating the null density in large-scale multiple test- 
ing to a mode matching method, applicable when the theoretical null belongs 
to any exponential family or a related distribution such as t or F. The em- 
pirical null estimate is accompanied by an estimate of po, the proportion 
of true null tests in the data. Further, the empirical null estimates can be 
used directly to estimate local and tail FDR curves for FDR inference. Delta 
method covariance estimates and bias formulas have been derived. We have 
seen that FDR estimates are biased down at the far tails and should be 
taken cautiously whenever the corresponding observed bin counts are small. 
The effect of correlation has been explained by a generalization of Efron's 
"wing function." 

Efron (2005b, 2007b) discusses several reasons why the empirical null may 
not match exactly the theoretical null in observational studies. It should be 
emphasized that mode matching does not necessarily increase power with 
respect to the theoretical null [Efron (2004) provides counterexamples]. In- 
stead, the empirical null answers a question of model validity. 

In the normal case, another empirical null method called MLE fitting 
[Efron (2007b)] has been reported to give similar empirical null estimates 
with slightly lower variance. Mode matching is easier to analyze and is ap- 
pealing because of its application to exponential families. It is also easier 
to implement in practice because of available software and computational 
efficiency. But, in principle, just like mode matching, MLE fitting could be 
extended to other distributions beyond the normal too. 
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At least two aspects of mode matching may benefit from further study 
beyond this paper. One aspect is the possibihty of choosing data-dependent 
hmits for the fitting interval 5o . Fixed limits based on the theoretical null are 
inappropriate precisely because the empirical null is expected to be displaced 
or scaled with respect the theoretical null. For instance, in an analysis of 
X^-scores, Schwartzman, Dougherty and Taylor (2008a) used as upper limit 
the 90th percentile of the empirical test statistic distribution. In the SNP 
data example above, a bootstrap analysis showed that setting the upper 
limit to the 99.95th percentile of the test statistic distribution, rather than 
the 99.95th percentile of the theoretical x^i^) density (the value 20 used 
previously), results in empirical null variance estimates that are about 50% 
higher than those obtained when the limit was fixed. This suggests that 
the cost of a data-dependent limit might not be too high. Unfortunately, as 
shown in Section 4 above, the choice of the limit depends very much on the 
alternative distribution and the null proportion, both of which are unknown. 

Another aspect is the possibility of using the two-step approach of Efron 
(2007b) of estimating the mixture density nonparametrically before fitting 
the empirical null by mode matching. As noted above, the most crucial issue 
is the bias in the tails of the density. Future exploration may yield an answer 
to what is the best way to estimate the mixture density for mode matching. 
For example, different bin widths A could be used inside and outside the 
fitting interval Sq since they serve different purposes. Inside 5*0 one could 
optimize A for empirical null estimation, while outside 5o one could optimize 
A for FDR estimation. 

Matlab functions implementing the methods described in this paper are 
available at htt p : / /biowww. dfci . harvard . edu/ ~ ar min/ software . ht ml . 



A.l. Normal family. The empirical null N{fi,a'^) with 0= {fi,a^) has 
exponential family form 



APPENDIX A: SPECIAL CASES 








1 



The Poisson regression (8) using t and t"^ as predictors and log{NA/\/^) 
as offset gives estimates 172 and C. From these we obtain 
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The parameter derivative matrix D required for computing the covariance 
(15) of e+ = {\ogpQ,fi,^^)' is 



D 



d{f]+y 



V 



Vi jjl \_\ 

2f/2 4ry| 2f/2 

2772 





2vl 
1 



■ 1 fi + d^' 
2a^ 



The normal family N{fi,a'^) lends itself to two exponential subfamilies. 



A. 1.1. Estimate 9 = fi. The empirical null N^^^Uq) with fixed (Tq has 
the exponential family form x{t) = t, r] = /i/ciQ, ^(r?) = a'^'tf' jl and (70(0 = 
g-t2/(2o-g) y 27r(TQ. Poisson regression using t as predictor and log(A^A(7o(i)) 
as offset gives estimates f] and C. From these we obtain 

A=-g^, iogpo=c+^, D=(J ;2)- 

A. 1.2. Estimate 9 = . The empirical null N{fj,Q,a'^) with fixed /Uq has 
the exponential family form x{t) = (t — /io)^, '7 = — l/(2c2), V'(^) = (~l/2) x 
log(— 2?7) and go(t) = l/-v/27r. Poisson regression using (t — 7x0)^ as predictor 
and \og{N A / \/^tt) as offset gives estimates fj and C. Prom these we obtain 

^' = -^, logpo = (7-ilog(-277), D=(l 2^4)- 

A. 2. Scaled family (Gamma). The empirical null ax^(z^) (11) with 
9 = (a, u)' has exponential family form 

x{t) = {t,iogty, ^(^) = iog(^Ii|±i).^ 

= (m,??2)'= (^-^,^ - 1) , 9o(i) = l- 

The Poisson regression (8) using t and logt as predictors gives estimates f/i, 
572 and C. From these we obtain 

a = --^, z> = 2(7/2 + 1), logpo = C' + V'(r))- 
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The parameter derivative matrix D required for computing the covariance 
(15) of 6^ = (logpo;«)i^)' is 



(l ^(^2 + l)-log(-r/l)^ 

^= ^ 

VO 2 

1 au ^'(z)/2) +log(2a)~ 
2a^ 
.0 2 , 

where ^{z) = {d/ dz)\ogT{z) is the Digamma function. The scaled family 
lends itself to two exponential subfamilies. 



A. 2.1. Estimate 9 = a. The empirical null ax^(z^o) with fixed z/q has ex- 
ponential family form x{t) =t, rj = — l/(2a), ipirf) = — log(— ry) and 
gQ{t) = t'^°/'^~'^ /T{vq/2). Poisson regression using t as a predictor and \og{N A.gQ{t)) 

as offset gives estimates fj and C. Prom these we obtain 



2r] 



n ^0 1 
C--10, 



1 afo\ 
2d^) 



A. 2. 2. Estimate 6 = u. The empirical null aoX^(^) with fixed has 
exponential family form x{t) = logi, r/ = 1^/2 — 1, 'ip{r]) = logP(77 + 1) + (77 + 
1) log(2ao) and goit) = e~*/''^"°\ Poisson regression using logt as a predictor 
and log(A^A(7o(i)) as offset gives estimates fj and C. From these we obtain 



i> = 2(77 + 1), logpo = C + V(??), 



1 ^(z)/2)+log(2ao) 
2 



APPENDIX B: PROOFS 



Proof of Proposition 1. The score equation for the Poisson regres- 
sion (8) including the external weights W is 



(35) 



X'W[y - exp(X^+ + h)] = 0. 



The rate of change of the MLE vector t)"*" with respect to the count vector 
y, considered as continuous, is 



(36) 



dy' 



{x'wvxy^x'w, 
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obtained by differentiating (35) witli respect to y and replacing (9). Condi- 
tional on the covariance estimate of y is V m- Thus, the delta method 
covariance estimate of f]^ is {dT]~^ /dy')V ^{df]^ /dy')' , yielding (14). 
The rate of change of 6^ with respect to r}^ is 

n- _ d{\ogp,,e{rjyy _(i ^{rjy 
^ ^ di-n+y d{c,v') vo eirjy 

so the rate of change of 0^ with respect to fj^ at i) is D = D{ii). The delta 
method covariance estimate of 6^ is {dO^ / d{f}^y) cBv{f}^){d6^ / d{f}^yy , 
yielding (15). □ 

Proof of Proposition 2. The rate of change (16) of the vector logy = 
Xfj~^ with respect to y follows directly by (36). By the chain rule, dy/dy' = 
[dy / d{\ogyy][d{\ogy) / dy'] = VDy, and similarly, d{y — y)/dy' = I — VDy. 
The result follows by the delta method. □ 

Proof of Proposition 3. (a) Follows immediately by the delta method 
and the definition of A. 

(b) To evaluate the rate of change of the vector log(Sy) with respect to 
y, compute 

d{\ogiSy))k _ d . (l . #^ . \ _ ife + Sjlfc+ife 
dy, - % I, 2^^ + - ly, + Ef...r h • 



Thus, 



dy' dy' dy' 

where we have used the fact that d(logyk)/dyi = {l/yk)dyk/dyi. The result 
now follows by the delta method and the definition of B. □ 

Proof of Proposition 4. Dividing the score equation (35) by 7A and 
applying the law of large numbers as 7 — > 00 gives that r/^ converges to the 
solution f]^ of the equation 

(38) X'Wipofo + (1 - po)/a - Diag(5o(t)) exp(Xr)+ )] = 0. 

In particular, if po = 1; we have that r)"*" is asymptotically unbiased, that is, 

^00 = (0) ''?')'• The idea is to find a first order expansion of r)"*" near pQ = 1. 

Differentiating (38) with respect to po, we obtain that, at po = 1, df]'^/dpo = 

(X'WDiag(/o)X)'iX'W(/o - f^). The bias in the estimation of r/+ is 

approximately 

f)to-v^ = 'hto-{o,v'y-iiog{po),o'y 



dpo 



(po-l)-(log(po),0' 

Po=l 
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yielding (26). Similarly, the bias in the estimation of 6^ is a first order 
expansion of 6^ with respect to rj"*" near po = 1. □ 



Proof of Proposition 5. (a) For known po and /o, (18) says fdr^ 
Afc,o/2/fc, where yk ~ -Po(Afc), = jAf{tk) and Afc,o = 7Apo/o(tfc)- Thus, 



E[fdrfc|yfc>0]=E 



Afc,i 



Vk 



/ >^k \yk 



yfe >0 =fdrfc-C(Afc), 



where C('^) is defined for a generic y Po{\) as C('^) = E(A/y|y > 0). By 
direct evaluation, 



C(A) 



1 



A e-^A^ 



1 - e-^ ^ j ?! 



~f^J^- 



A 



3^-1 



E— r^^ 



J! 



which is equal to (27). 

(b) When the empirical null is used, the local fdr estimate (18) is fdr^ = 
Vk/Uk, where = iVApo/o(*fc)- Notice that when evaluated at ^ Sq, the 
numerator yk is independent of the denominator y^- Thus, 



E[£ik\yk>0]=E( 



m 
Kvk 



yk>o 



Afc,oE(yfc)j,/^ Afc 



Xk A 



fc,0 



Vk 



yk>0] ?»fdrfe6fcC(Afe), 



where the vector &oo is obtained as follows. By (8) and (9), Aq = exp(Xr/+ + 
h) and E(y) = E[exp(X77+ + h)] exp{Xfi^ + h) as ^ oo. Dividing 
entry by entry gives boo = exp(X(^+ — r/"*")). □ 



Proof of Theorem 1. The form of expression (29) follows from Efron 
(2007a), Lemma 1. A similar argument as in Efron (2007a), Lemma 2, gives 
that the entries of S are 

(39) 6ki « /' Rkiip) dGip), RM = lf':fu\ - 1- 

Replacing (28) in (39) gives (30). □ 
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